Search Result

Handwritten English text recognition based on convolutional neural network and Transformer

Xianjie ZHANG, Zhiming ZHANG

Journal of Computer Applications 2022, 42 (8): 2394-2400. DOI: 10.11772/j.issn.1001-9081.2021091564

Handwritten text recognition technology can transcribe handwritten documents into editable digital documents. However, because of varied writing styles, ever-changing document structures, and the low accuracy of recognition based on character segmentation, handwritten English text recognition based on neural networks still faces many challenges. To address these problems, a handwritten English text recognition model based on a Convolutional Neural Network (CNN) and a Transformer was proposed. First, the CNN was used to extract features from the input image. Then, the features were fed into the Transformer encoder to obtain a prediction for each frame of the feature sequence. Finally, a Connectionist Temporal Classification (CTC) decoder was used to obtain the final prediction result. Extensive experiments were conducted on the public Institut für Angewandte Mathematik (IAM) handwritten English word dataset. Experimental results show that the model achieves a Character Error Rate (CER) of 3.60% and a Word Error Rate (WER) of 12.70%, which verifies the feasibility of the proposed model.
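
The last two steps of the abstract, CTC decoding and CER scoring, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes greedy (best-path) decoding, a blank symbol at class index 0, and hypothetical helper names (`ctc_greedy_decode`, `edit_distance`, `cer`). The per-frame scores stand in for the Transformer encoder output.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding.

    frame_scores: one list of class scores per frame.
    alphabet: string whose i-th character corresponds to class i + 1.
    """
    # Take the highest-scoring class in every frame.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # Merge consecutive repeats, then drop blanks.
    collapsed = [k for k, _ in groupby(best)]
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance divided by reference length."""
    return edit_distance(ref, hyp) / len(ref)
```

Applying `cer` to lists of words instead of strings gives the corresponding word-level rate, since `edit_distance` works on any sequence.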

UWB-VIO integrated indoor positioning algorithm for mobile robots

Bingqi SHEN, Zhiming ZHANG, Shaolong SHU

Journal of Computer Applications 2022, 42 (12): 3924-3930. DOI: 10.11772/j.issn.1001-9081.2021101778

For the positioning of mobile robots in indoor environments, the emerging auxiliary positioning technology based on Visual Inertial Odometry (VIO) is heavily limited by lighting conditions and cannot work in the dark, while Ultra-Wide Band (UWB)-based positioning methods are easily affected by Non-Line Of Sight (NLOS) error. To address these problems, an indoor mobile robot positioning algorithm combining UWB and VIO was proposed. First, the Stereo Multi-State Constraint Kalman Filter (S-MSCKF) algorithm was used to obtain the position output by VIO, and the Double-Sided Two-Way Ranging (DS-TWR) algorithm together with the trilateral positioning method was used to obtain the position resolved by UWB. Then, the motion equation and observation equation of the position measurement system were established. Finally, the optimal position estimate of the robot was obtained by fusing the data with the Error-State Extended Kalman Filter (ES-EKF) algorithm. A purpose-built mobile positioning platform was used to verify the combined positioning method in different indoor environments. Experimental results show that, in an indoor environment with obstacles, the proposed algorithm reduces the maximum overall positioning error by about 4.4% and the mean square error of overall positioning by about 6.3% compared with positioning using UWB alone, and reduces the maximum overall positioning error by about 31.5% and the mean square error of overall positioning by about 60.3% compared with positioning using VIO alone. The proposed algorithm can therefore provide real-time, accurate and robust positioning results for mobile robots in indoor environments.
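
The trilateral positioning step on the UWB side can be sketched as follows. This is only an illustration under simplifying assumptions that the abstract does not state: a 2D plane, exactly three anchors, and noise-free ranges. The function name `trilaterate_2d` is hypothetical. Subtracting the first range equation from the other two removes the quadratic terms and leaves a 2x2 linear system.

```python
import math


def trilaterate_2d(anchors, ranges):
    """Estimate (x, y) from three anchor positions and measured ranges.

    anchors: [(x1, y1), (x2, y2), (x3, y3)]
    ranges:  [d1, d2, d3], distance from the tag to each anchor
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = ranges
    # Linear system A @ [x, y] = b, obtained by subtracting circle 1
    # from circles 2 and 3.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 + y2**2 - x1**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 + y3**2 - x1**2 - y1**2
    det = a11 * a22 - a12 * a21
    if math.isclose(det, 0.0, abs_tol=1e-12):
        raise ValueError("anchors are collinear; position is not unique")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With noisy ranges or more than three anchors, the same linearization is usually solved in a least-squares sense. NLOS errors bias the ranges themselves, which is the weakness the fusion with VIO is meant to offset.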

Visual simultaneous localization and mapping based on semantic and optical flow constraints in dynamic scenes

Hao FU, Hegen XU, Zhiming ZHANG, Shaohua QI

Journal of Computer Applications 2021, 41 (11): 3337-3344. DOI: 10.11772/j.issn.1001-9081.2021010003

To address localization and static semantic mapping in dynamic scenes, a Simultaneous Localization And Mapping (SLAM) algorithm based on semantic and optical flow constraints was proposed to reduce the impact of moving objects on localization and mapping. First, for each input frame, the masks of the objects in the frame were obtained by semantic segmentation, and the feature points that did not satisfy the epipolar constraint were filtered out by a geometric method. Second, the dynamic probability of each object was calculated by combining the object masks with the optical flow, the feature points were filtered by these dynamic probabilities to obtain the static feature points, and the static feature points were used for the subsequent camera pose estimation. Then, the static point cloud was created from RGB-D images and object dynamic probabilities, and the semantic octree map was built by incorporating the semantic segmentation. Finally, the sparse semantic map was created from the static point cloud and the semantic segmentation. Test results on the public TUM dataset show that, in highly dynamic scenes, the proposed algorithm improves both the absolute trajectory error and the relative pose error by more than 95% compared with ORB-SLAM2, and reduces the absolute trajectory error by 41% and 11% compared with DS-SLAM and DynaSLAM respectively, which verifies that the proposed algorithm has better localization accuracy and robustness in highly dynamic scenes. The mapping experiments show that the proposed algorithm creates a static semantic map, and that the storage requirement of the sparse semantic map is 99% lower than that of the point cloud map.
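
The geometric filtering step can be illustrated with a minimal epipolar-constraint check. This is a sketch rather than the authors' code: it assumes a known fundamental matrix `F` between two frames, pixel coordinates for each matched feature, and a hypothetical distance threshold of one pixel. The function names are illustrative.

```python
import math


def epipolar_distance(F, p1, p2):
    """Distance in pixels from p2 to the epipolar line F @ [p1, 1].

    F:  3x3 fundamental matrix as nested lists.
    p1: (x, y) in the first frame; p2: (x, y) in the second frame.
    """
    x1, y1 = p1
    x2, y2 = p2
    # Epipolar line l = F @ (x1, y1, 1) in the second image.
    l0, l1, l2 = (row[0] * x1 + row[1] * y1 + row[2] for row in F)
    norm = math.hypot(l0, l1)
    if norm == 0.0:
        return math.inf  # degenerate line; treat the match as unreliable
    return abs(l0 * x2 + l1 * y2 + l2) / norm


def filter_static_matches(F, matches, threshold=1.0):
    """Keep matches whose second point lies near its epipolar line."""
    return [(p1, p2) for p1, p2 in matches
            if epipolar_distance(F, p1, p2) <= threshold]
```

A point on a moving object generally drifts away from its epipolar line, so it fails this test. The exception is motion along the line itself, which is one reason the abstract combines the geometric check with semantic masks and optical flow.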
